1 Research Question

The aim of this analysis is to investigate diabetes prevalence over time and by country/region. The purpose is to identify countries and years with high diabetes prevalence.

2 Dataset Introduction

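The dataset ‘DIABETES evolution of diabetes over time’ is a global dataset of diabetes prevalence from 1980 to 2014. It contains 14,000 observations and 7 variables: Country/Region/World, ISO, Sex, Year, Age-standardised diabetes prevalence, Lower 95% uncertainty interval and Upper 95% uncertainty interval.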

Table 1 below shows the first six observations of the full dataset.

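# Load packages
library(tidyverse)  # read_csv(), dplyr verbs, ggplot2
library(knitr)      # kable(), include_graphics()
library(plotly)     # ggplotly()

# Read in data
data_full <- read_csv("Data/Diabetes_data.csv")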
## Rows: 14000 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Country/Region/World, ISO, Sex
## dbl (4): Year, Age-standardised diabetes prevalence, Lower 95% uncertainty i...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data_full_head <- head(data_full)

kable(data_full_head, 
             caption = "Table 1: First Six Observations of the Full Diabetes Dataset",
             digits = 2)
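Table 1: First Six Observations of the Full Diabetes Dataset

| Country/Region/World | ISO | Sex | Year | Age-standardised diabetes prevalence | Lower 95% uncertainty interval | Upper 95% uncertainty interval |
|:---------------------|:----|:----|-----:|-------------------------------------:|-------------------------------:|-------------------------------:|
| Afghanistan          | AFG | Men | 1980 |                                 0.04 |                           0.02 |                           0.09 |
| Afghanistan          | AFG | Men | 1981 |                                 0.05 |                           0.02 |                           0.09 |
| Afghanistan          | AFG | Men | 1982 |                                 0.05 |                           0.02 |                           0.09 |
| Afghanistan          | AFG | Men | 1983 |                                 0.05 |                           0.02 |                           0.09 |
| Afghanistan          | AFG | Men | 1984 |                                 0.05 |                           0.02 |                           0.09 |
| Afghanistan          | AFG | Men | 1985 |                                 0.05 |                           0.02 |                           0.09 |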

3 Dataset Description

The full dataset was reduced to 1000 observations by randomly generating row numbers. The variable “ISO” was removed as it was not needed for the analysis. Figure 1 below shows the code used to tidy the full dataset into the reduced dataset.

include_graphics("Image/code_screenshot.png")
Figure 1: Code Screenshot of Data Tidying
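
Because the tidying code is only available as a screenshot, the following is a minimal sketch of how such a reduction might be written, not the code actually used. It assumes dplyr is installed; the toy data frame `toy_full`, the seed and the sample size of 5 are all hypothetical stand-ins (the report sampled 1000 rows from `data_full`). The renamed columns match those shown by str() in this section.

```r
library(dplyr)

# Toy stand-in for data_full: same column names, made-up values (illustration only)
toy_full <- tibble(
  `Country/Region/World` = rep(c("Afghanistan", "Australia"), each = 4),
  ISO = rep(c("AFG", "AUS"), each = 4),
  Sex = rep(c("Men", "Women"), times = 4),
  Year = rep(c(1980, 1980, 1981, 1981), times = 2),
  `Age-standardised diabetes prevalence` = c(0.04, 0.05, 0.05, 0.05, 0.06, 0.05, 0.06, 0.05),
  `Lower 95% uncertainty interval` = c(0.02, 0.03, 0.02, 0.03, 0.04, 0.03, 0.04, 0.03),
  `Upper 95% uncertainty interval` = c(0.09, 0.09, 0.09, 0.09, 0.08, 0.07, 0.08, 0.07)
)

set.seed(42)                                   # hypothetical seed, for reproducibility
n_keep <- 5                                    # the report kept 1000 rows
random_rows <- sample(nrow(toy_full), n_keep)  # random row numbers, without replacement

toy_reduced <- toy_full %>%
  slice(random_rows) %>%                       # keep only the sampled rows
  select(-ISO) %>%                             # drop the ISO code
  rename(diabetes_prevalence = `Age-standardised diabetes prevalence`,
         lower_95 = `Lower 95% uncertainty interval`,
         upper_95 = `Upper 95% uncertainty interval`)

str(toy_reduced)
```

Setting a seed before sampling means the same rows are drawn on every run, which keeps the tables and figures reproducible.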

The function str() is applied to the first 2 rows of the reduced data to show the type of each variable (numeric, character/factor etc.). The assessment requires a maximum of 5 variables, but both “lower_95” and “upper_95” were kept because together they define the 95% uncertainty interval.

head_data_2 <- head(data,2)
str(head_data_2)
## tibble [2 × 6] (S3: tbl_df/tbl/data.frame)
##  $ Country/Region/World: chr [1:2] "Togo" "Canada"
##  $ Sex                 : chr [1:2] "Women" "Men"
##  $ Year                : num [1:2] 2008 1987
##  $ diabetes_prevalence : num [1:2] 0.0603 0.0529
##  $ lower_95            : num [1:2] 0.0389 0.0312
##  $ upper_95            : num [1:2] 0.0874 0.082

4 Data Summary

Two summary statistics, the mean and the standard deviation, were calculated by “Year” for diabetes prevalence and for its lower and upper 95% bounds. Table 2 shows the results for the last ten years (2005 to 2014).

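data_summary <- data %>%
  group_by(Year) %>%
  summarise(mean_diabetes = mean(diabetes_prevalence),
            sd_diabetes = sd(diabetes_prevalence),
            mean_upper95 = mean(upper_95),
            sd_upper95 = sd(upper_95),
            mean_lower95 = mean(lower_95),  # mean, not sd
            sd_lower95 = sd(lower_95))

# Keep the last 10 years (2005 to 2014)
tail_data_summary <- tail(data_summary, 10)

kable(tail_data_summary,
      caption = "Table 2: Mean and Standard Deviation of Diabetes Prevalence by Year (Last 10 Rows)",
      digits = 3)
Table 2: Mean and Standard Deviation of Diabetes Prevalence by Year (Last 10 Rows)

| Year | mean_diabetes | sd_diabetes | mean_upper95 | sd_upper95 | mean_lower95 | sd_lower95 |
|-----:|--------------:|------------:|-------------:|-----------:|-------------:|-----------:|
| 2005 |         0.091 |       0.053 |        0.126 |      0.065 |        0.042 |      0.042 |
| 2006 |         0.086 |       0.034 |        0.121 |      0.043 |        0.026 |      0.026 |
| 2007 |         0.093 |       0.053 |        0.130 |      0.068 |        0.041 |      0.041 |
| 2008 |         0.109 |       0.057 |        0.150 |      0.072 |        0.043 |      0.043 |
| 2009 |         0.102 |       0.063 |        0.144 |      0.081 |        0.047 |      0.047 |
| 2010 |         0.109 |       0.064 |        0.155 |      0.083 |        0.047 |      0.047 |
| 2011 |         0.091 |       0.052 |        0.136 |      0.067 |        0.039 |      0.039 |
| 2012 |         0.093 |       0.048 |        0.141 |      0.065 |        0.033 |      0.033 |
| 2013 |         0.100 |       0.050 |        0.157 |      0.069 |        0.033 |      0.033 |
| 2014 |         0.097 |       0.037 |        0.163 |      0.053 |        0.024 |      0.024 |

The mean_lower95 values printed above are identical to sd_lower95 because they come from a run in which that column was computed with sd(lower_95). They are standard deviations, not means, and the code must be rerun to obtain the true means.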

Table 2 shows a slight overall increase in mean diabetes prevalence, from 9.1% in 2005 to 9.7% in 2014, although values fluctuate from year to year. Over this period, 2008 and 2010 had the highest mean diabetes prevalence at 10.9%, and 2010 also had the highest standard deviation (0.064).
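
Readings of the peak year can be checked directly against the printed values in Table 2. The sketch below is illustrative only: it assumes dplyr is installed and retypes the rounded mean_diabetes and sd_diabetes columns from Table 2 into a hypothetical object `table_2`, rather than using the unrounded `data_summary`.

```r
library(dplyr)

# Rounded values copied from Table 2 (2005 to 2014)
table_2 <- tibble(
  Year = 2005:2014,
  mean_diabetes = c(0.091, 0.086, 0.093, 0.109, 0.102, 0.109, 0.091, 0.093, 0.100, 0.097),
  sd_diabetes   = c(0.053, 0.034, 0.053, 0.057, 0.063, 0.064, 0.052, 0.048, 0.050, 0.037)
)

# Year(s) with the highest mean prevalence; ties are kept by default
peak_mean <- table_2 %>% slice_max(mean_diabetes, n = 1)

# Year with the highest standard deviation
peak_sd <- table_2 %>% slice_max(sd_diabetes, n = 1)

peak_mean$Year  # 2008 and 2010, tied at 0.109
peak_sd$Year    # 2010
```

Because the inputs are rounded to three decimals, ties such as 2008 and 2010 may separate when the same check is run on the unrounded summary.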

5 Visualisations

A figure was created with the ggplot2 R package using geom_point() for the yearly means, geom_smooth() for the trend line and geom_errorbar() for the standard deviation. It is displayed in Figure 2:

Figure_2 <- ggplot(data = data_summary, aes(x = Year, y = mean_diabetes)) + 
  geom_point(alpha = 0.7) + 
  labs(title = "Figure 2: Mean Diabetes Prevalence Increases Over Time", 
       caption = "geom_smooth() using method = 'loess' and formula = 'y ~ x'",
       subtitle = "Red Bars Represent Standard Deviation") + 
  xlab("Year") + 
  ylab("Mean Diabetes Prevalence") + 
  theme_minimal() + 
  # Set the smoother explicitly so it matches the caption
  geom_smooth(method = "loess", formula = y ~ x) + 
  # Error bars span one standard deviation either side of the mean
  geom_errorbar(aes(ymin = mean_diabetes - sd_diabetes, ymax = mean_diabetes + sd_diabetes),
                colour = "red", alpha = 0.3)

ggplotly(Figure_2)

5.0.1 Figure 2 Summary

  • There is a slight increase in mean diabetes prevalence from 1980 to 2014
  • The standard deviation bars are wide relative to the yearly means, indicating high dispersion in the data

6 Australian Diabetes Trends By Sex

Australia_summary <- data_full %>%
  filter(`Country/Region/World` == "Australia")

Figure_3 <- ggplot(data = Australia_summary, aes(x = Year, y = `Age-standardised diabetes prevalence`, col = Sex)) + 
  geom_point(alpha = 0.8) + 
  labs(title = "Figure 3: Men have Higher Risk of Diabetes", 
       caption = "geom_smooth() using method = 'loess' and formula = 'y ~ x'",
       subtitle = "Diabetes Prevalence Has Increased Over Time") + 
  xlab("Year") + 
  ylab("Age-standardised Diabetes Prevalence") + 
  theme_minimal() + 
  # Set the smoother explicitly so it matches the caption
  geom_smooth(method = "loess", formula = y ~ x)

Figure_3

Figure 3 shows diabetes prevalence in Australia increasing over time, with men noticeably higher than women. There is a steep increase from 1980 to 2000, followed by a plateau. The series ends in 2014, the last year in the dataset, so it is unknown whether the plateau continues or begins to trend downwards.

7 Mean Diabetes Prevalence by Country

Australia_table_summary <- data_full %>%
  filter(`Country/Region/World` %in% c("Australia", "Germany", "China", "South Africa", "United States of America")) %>%
  select(-ISO) %>%
  group_by(`Country/Region/World`, Sex) %>%
  # Average over all years for each country and sex, then drop the grouping
  summarise(`Mean diabetes prevalence` = mean(`Age-standardised diabetes prevalence`),
            .groups = "drop")

kable(Australia_table_summary, 
      caption = "Table 3: Mean Diabetes Prevalence by Country and Sex",
      digits = 3)
Table 3: Mean Diabetes Prevalence by Country and Sex

| Country/Region/World     | Sex   | Mean diabetes prevalence |
|:-------------------------|:------|-------------------------:|
| Australia                | Men   |                    0.064 |
| Australia                | Women |                    0.047 |
| China                    | Men   |                    0.060 |
| China                    | Women |                    0.061 |
| Germany                  | Men   |                    0.056 |
| Germany                  | Women |                    0.040 |
| South Africa             | Men   |                    0.069 |
| South Africa             | Women |                    0.097 |
| United States of America | Men   |                    0.065 |
| United States of America | Women |                    0.054 |

Five countries were selected at random to compare mean diabetes prevalence, averaged over all years, by country and sex. Table 3 shows that in Australia, Germany and the United States of America, men have a higher mean diabetes prevalence than women. Mean diabetes prevalence for men and women in China is very similar, with women 0.001 higher. Interestingly, women in South Africa have a markedly higher mean diabetes prevalence than men (0.097 compared with 0.069).
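
The sex comparison can be made explicit by computing the men-minus-women gap for each country. The sketch below is illustrative only: it assumes dplyr and tidyr are installed and retypes the rounded values from Table 3 into a hypothetical object `table_3` with simplified column names, rather than using the unrounded summary.

```r
library(dplyr)
library(tidyr)

# Rounded values copied from Table 3
table_3 <- tibble(
  country = rep(c("Australia", "China", "Germany", "South Africa",
                  "United States of America"), each = 2),
  sex = rep(c("Men", "Women"), times = 5),
  mean_prevalence = c(0.064, 0.047, 0.060, 0.061, 0.056,
                      0.040, 0.069, 0.097, 0.065, 0.054)
)

sex_gap <- table_3 %>%
  # One row per country, with Men and Women as separate columns
  pivot_wider(names_from = sex, values_from = mean_prevalence) %>%
  # Positive gap: men higher; negative gap: women higher
  mutate(men_minus_women = Men - Women) %>%
  arrange(desc(men_minus_women))

sex_gap
```

Sorting by the gap puts the countries where men are furthest ahead at the top and the countries where women are ahead at the bottom.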

8 Conclusions